Playback speech detection algorithm based on modified cepstrum feature
LIN Lang, WANG Rangding, YAN Diqun, LI Can
Journal of Computer Applications    2018, 38 (6): 1648-1652.   DOI: 10.11772/j.issn.1001-9081.2017112822
With the development of speech technology, spoofed speech of various kinds, typified by playback speech, poses a serious challenge to voiceprint authentication systems and audio forensics. To address playback attacks on voiceprint authentication systems, a detection algorithm based on a modified cepstrum feature is proposed. First, the coefficient of variation is used to analyze how original speech and playback speech differ in the frequency domain. Second, in the Mel Frequency Cepstral Coefficient (MFCC) extraction pipeline, the Mel filter bank is replaced by a new filter bank composed of inverse-Mel filters and linear filters, which yields the modified cepstrum feature. Finally, a Gaussian Mixture Model (GMM) is used as the classifier to distinguish original speech from playback speech. Experimental results show that the modified cepstrum feature detects playback speech effectively, with an equal error rate of about 3.45%.
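The two signal-processing ingredients named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the inverse-Mel and linear filters are combined or how many filters are used, so the helper names (`mel_edges`, `inverse_mel_edges`, `triangular_bank`, `coefficient_of_variation`), the mirrored-Mel construction, and all parameter values here are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_edges(n_filters, sr):
    """Filter edge frequencies in Hz, equally spaced on the Mel scale."""
    mels = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    return mel_to_hz(mels)

def inverse_mel_edges(n_filters, sr):
    """Assumed construction: mirror the Mel spacing about the band centre,
    so that filters are narrowest at high frequencies instead of low ones."""
    return (sr / 2.0 - mel_edges(n_filters, sr))[::-1]

def triangular_bank(edges_hz, n_fft, sr):
    """Triangular filters built from ascending edge frequencies in Hz."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((len(edges_hz) - 2, freqs.size))
    for i in range(bank.shape[0]):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def coefficient_of_variation(power_spec):
    """Per-bin std/mean over frames; power_spec has shape (frames, bins)."""
    mean = power_spec.mean(axis=0)
    return power_spec.std(axis=0) / np.maximum(mean, 1e-12)

# Hypothetical settings: 10 filters, 512-point FFT, 16 kHz sampling rate.
inv_bank = triangular_bank(inverse_mel_edges(10, 16000), 512, 16000)
```

Applying `inv_bank` to a power spectrum in place of the usual Mel bank, then taking the logarithm and a DCT, would give an inverse-Mel cepstrum; the linear-filter part of the bank described in the abstract is left out because its layout is not specified there.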
Recaptured speech detection algorithm based on convolutional neural network
LI Can, WANG Rangding, YAN Diqun
Journal of Computer Applications    2018, 38 (1): 79-83.   DOI: 10.11772/j.issn.1001-9081.2017071896
Recaptured speech attacks on speaker recognition systems harm the rights and interests of legitimate users. To address this problem, a recaptured speech detection algorithm based on a Convolutional Neural Network (CNN) is proposed. First, spectrograms of the original speech and the recaptured speech are extracted and fed into the CNN for feature extraction and classification. Second, a new network architecture is constructed for the detection task, and the effect of spectrograms computed with different window shifts is discussed. Finally, cross-over experiments across various recapture and replay devices are conducted. Experimental results demonstrate that the proposed method accurately determines whether the speech under test is recaptured, with a recognition rate of 99.26%. Compared with the silent-segment Mel-Frequency Cepstral Coefficient (MFCC) algorithm, the channel mode noise algorithm, and the long window scale factor algorithm, the recognition rate is higher by about 26, 21, and 0.35 percentage points, respectively.
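To make the role of the window shift concrete, the following sketch computes a log-magnitude spectrogram with a configurable hop size. It is only an illustration under assumed settings: the frame length, Hamming window, hop values, and the function name `log_spectrogram` are not taken from the paper, and the CNN itself is omitted because the abstract does not describe its architecture.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=160):
    """Log-magnitude spectrogram with shape (frames, frame_len // 2 + 1)."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Slice overlapping frames; a smaller hop gives more frames per second.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-10)

# One second of a synthetic 1 kHz tone at 16 kHz stands in for real speech.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
for hop in (64, 128, 256):
    print(hop, log_spectrogram(tone, hop=hop).shape)
```

Halving the window shift roughly doubles the time resolution of the image passed to the network while leaving the frequency axis unchanged, which is the trade-off the abstract says was examined.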